FIELD OF THE INVENTION

This invention relates to the field of bacterial molecular biology and, in particular, to genetic engineering

attain desired expression levels of novel toxins having agronomic value.



BACKGROUND OF THE INVENTION 



Sekar et al. (1987) Proc. Natl. Acad. Sci. USA 84:7036-7040; U.S. Patent Application 108,285, filed October
tf»in aene was contained on a 5.9 kd bam ni Tragmem vyi^"' ^ / «„.,_-, /nop"* that pn- 

re Tt.^r V le* JS'TlSw SSXp— « o. =h,menc genes containing .he entire coding «. 
a 35S promote, a vira, (TMV) ^ ■ « » ™u.ncV Unde, these conditions

* ~ Thte leve ' - e *" ession s,i " ,ep ' es,n,s a 

peptide is disturbed in truncated polypeptides. 55:303- 
A number of researchers have attempted to express plan " J^SS, and E. colUFuza- 

317; Rothstein et al. (1987) Gene 55:353-35e ^M"?*^ 1987) 

due in part to oodon usage bias, since <,-gl,ad,n "^^^^^^^n glycinin A2 (Fuzakawa 

plants to check the effects on the level of expression. 
SUMMARY OF THE INVENTION


,u,e hai-pins and RNA splice sites, in the synthetic genes, code™ , to specify a g^enam 
the XTA codon is avoided in both monocots and dicots. The synthetic genes of this invention

in dicots whereas the XTAcodon is avoiueu H pfinPri in the Detailed Description closely ap- 

more than about 10-15%. «tandard technoloqy known to the art. The 

Assembly of the Bt gene of this invent.cn -s P^°™ ed ^ n s ?^?^^2diment is enzymatically 

crystal protein in having toxicity to the same insects. 
BRIEF DESCRIPTION OF THE FIGURES 

the addition of 1 3-amino acids at the C-term.nus mn , tnjction of the synthetic Btt gene. Segments A 

Figure 2 represents a simplified scheme used ^^^^^^^^^ havingunique 

and ligated to form the desired DNA segment. 

DETAILED DESCRIPTION OF THE INVENTION 

The following definitions are provided in order to provide clarity as to the intent or scope of their usage in 

the Specification and claims.

Expression refers to the transcription and translation of a structural gene to yield the encoded protein.


o, transcript. Prompter ^~-™^ e ^ sites* RNA poty- 

downstream gene. In .^^^ZrT raters *ve transcription preferential in the 

function. . cim(ho< . i( . nf a oro tein A qene embodies the struc- 

in a gene are the 3' end and poly(A)+ addition sequences.
Structural gene is that portion of a gene comprising a DNA segment encoding a protein, polypeptide or a
portion thereof, and excluding the 5' sequence which

may be one which is normal* found m * e «^^ q ^^X~2 9-n. may be derived in who.e 
the art, nucleotide mismatches can occur at the th rd » ^ We ^ J e ^ f ^u^, (e . g ., substitutions, in-
substitutions in the final polypeptide sequence. Also, ^ ^^^^^^^ insignificant 
sertions or deletions) In certain regions of the gene »!^^ *^£ a * Z f unctionally of the 
whenever such modifications result in changes ,n ^^^^f * of. gene sequences can 
final product. Ithas been shown thatche^^ 

replace the corresponding reg.ons ,n the ^^^^3 of cross -hybridization of nucleic acids 
sequences may be identified by ^e sk,„ed inHames and Higgens (eds.) 

, SSSS I^TSSS STeJi of homology ,s often measured in terms of 

5 XZZSZS^ * «*• Bt — since they are ex - 

pressible at a higher level in plants than native ^ 9 en ~- hQSt ^ ,„ usage of 

Frequency of preferred codon usage refers to the preference exhibited by a specific host cell in usage of
nucleotide codons to specify a given amino acid. To determine the frequency of usage of a particular codon
in a gene, the number of occurrences of that codon in the gene is divided by the total number of occurrences
of all codons specifying the same amino acid in the gene. Table 1, for example, gives the frequency of codon
usage for Bt genes, which was obtained by analysis of Bt genes whose sequences are publicly available.
Similarly, the frequency of preferred codon usage exhibited by a host cell can be calculated by averaging the
frequency of preferred codon usage in a large number of genes expressed by the host cell. It is preferable that
this analysis be limited to genes that are highly expressed by the host cell. Table 1, for example, gives the
frequency of codon usage by highly expressed genes exhibited by dicotyledonous plants.

r^ncesobtained™^ 

by a r=:=^ 

theticBtt gene, whose sequence is SjT'^ 
from that of dicot plants is 7.8%. In general terms the overall average deviation of the codon usage of a
synthetic gene from that of a host cell is calculated using the equation

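$$A = \frac{1}{Z} \sum_{n=1}^{Z} \frac{\lvert X_n - Y_n \rvert}{X_n} \times 100$$

where X_n is the frequency of usage for codon n in the host cell, Y_n is the frequency of usage for codon n
in the synthetic gene, and Z is the total number of codons n.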
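As a rough illustration of the overall average deviation described above, the following Python sketch averages the percent deviation of a synthetic gene's codon fractions from the host's. The function name, the handling of codons whose host frequency is zero, and the restriction of the example to the lysine and alanine rows of Table 1 are assumptions made for this sketch; the exact form of the printed equation is partly illegible in this copy.

```python
def average_codon_deviation(host_freq, gene_freq):
    """Mean percent deviation of gene codon fractions from host codon fractions.

    host_freq and gene_freq map codon -> fractional usage within its amino acid.
    Codons with a host fraction of zero are skipped to avoid division by zero
    (an assumption of this sketch, not something the text specifies).
    """
    terms = []
    for codon, x in host_freq.items():
        if x == 0:
            continue
        y = gene_freq.get(codon, 0.0)
        terms.append(abs(x - y) / x * 100)
    return sum(terms) / len(terms)


# Lysine and alanine rows of Table 1: column A (dicot genes) and column C (synthetic Btt gene).
dicot = {"AAG": 0.61, "AAA": 0.39, "GCG": 0.05, "GCA": 0.26, "GCT": 0.42, "GCC": 0.28}
synthetic = {"AAG": 0.58, "AAA": 0.42, "GCG": 0.06, "GCA": 0.24, "GCT": 0.41, "GCC": 0.29}

print(round(average_codon_deviation(dicot, synthetic), 1))
```

Because only two amino acids are included, the printed value is not comparable to the 7.8% figure quoted for the whole gene; a full calculation would run over every sense codon.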

the original source. nf nwA means that the component nucleotides were 

^^^^ 

available machines. tn a Ipvel of expression of a designed 

™' "™- f^m^S^ZT^^ s^t b/guantified in Northern 
gene wherein the amount of , * J^™^ ™I"xp,eLd corresponding to greater than or egual to ap- 
blots and. thus, represents a level ol s P e =™ * transcribed at a level wherein the 

proximately 0.00,% of the ^.^t to t£SS ^«^^ 
amount of specific mRNAproduced » insuf toent o £ ^<*"^ l £ b . „ ly expressed no , ori , y allows 

parasternals formed In strains «^ ^" di ^ 

' S "oa SniiJCSnL been shown that the approximate,, ,32 KDa protein . a protoxm 

,^£££i3hU depending on the strain o, B^^^^^.,,,..*^* 

5 thatar^,^^ 

-Cu'thls invents is based on ~ni,,on that— 
protein in,ransgenicp,an,soanbe = edv« 

conversely, detection of these stab teed RNA '' a " sc " p ' s ,„ e Mern o( | ow expre ssion of insec 

evaluating parameters such as ^^J^Q^^^ sun -ounding the initiation ATG codon. 

50 RNAIeader. addition of intron sequence and mod. .cat.o '^J^ ^ of Bt protei n expression in 

To date, changes in these parameters have not ed * nhance modificatio ns in 

plants. Applicants find that, surpns.ngb £ ^^J^^^^Mp, ca n be studied using site- 
the coding region of the gene were e ™° .ynthetlc DNA duplexes containing the de- 
specific mutagenesis by replacement of restnct.on ^j*™*^*^* However , rec ent advances^ 

55 sired nucleotide changes (Lo et al. ^^^^^^^^ gene designed specifi- 
,ein been expressed ,n c Zr,g regl0 „ and assembling the gene from chemically

designing an improved mJCleot.de sequen ce l» l e con g g naturally-occurring 
s y n,hes fe ed d igonu d eo,,de = 

gene ^'^^^^^s which would result in improved expression of the synthetic gene ,n 
ZH ™T.o op«: £ iT e^enc, o, translation, codons preferred in highly expressed prote.ns 

"''^codo^lln genes ^^^f^^^^T^^ 

,e,„ encoded by that gene. 

the highly expressed yeast gene PGK1 results in a de ^ease ^ formaintaining mRNA 

^ y :rwirdrx x z^ 

this value, and not sufficiently high as to cause destebteahon of RNA '^^^ isa bout55%. 
Genes within a taxonomic group exhibit similarities in codon choice, regardless of the function
of these genes. Thus an estimate of the overall use of the genetic code by a taxonomic group can be obtained
by summing codon frequencies of all its sequenced genes. This species-specific codon choice is reported in
this invention from analysis of 208 plant genes. Both monocot and dicot plants are analyzed individually to
determine whether these broader taxonomic groups are characterized by different patterns of synonymous
codon preference. The plant genes included in the codon analysis code for proteins having a wide range

and the dicots. In general, the most important factor in discriminating between monocot and dicot patterns of
codon usage is the percentage G + C content of the degenerate third base. In monocots, 16 of 18 amino acids
favor G + C in this position, while dicots only favor G + C in 7 of 18 amino acids.
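The third-base comparison above can be sketched in Python as follows. The criterion used here (an amino acid "favors G + C" when its G- or C-ending codons together account for more than half of its usage) is an assumption of this sketch, as are the function name and the choice of four sample amino acids; the fractions are taken from columns A and D of Table 1.

```python
def amino_acids_favoring_gc3(usage):
    """Return the amino acids whose G- or C-ending codons exceed half of total usage."""
    favoring = []
    for amino_acid, codons in usage.items():
        gc_fraction = sum(f for codon, f in codons.items() if codon[-1] in "GC")
        if gc_fraction > 0.5:
            favoring.append(amino_acid)
    return favoring


# Sample rows from Table 1, column A (dicot genes).
dicot = {
    "Gly": {"GGG": 0.12, "GGA": 0.37, "GGT": 0.35, "GGC": 0.16},
    "Lys": {"AAG": 0.61, "AAA": 0.39},
    "Leu": {"TTG": 0.26, "TTA": 0.10, "CTG": 0.09, "CTA": 0.08, "CTT": 0.29, "CTC": 0.19},
    "Pro": {"CCG": 0.07, "CCA": 0.44, "CCT": 0.32, "CCC": 0.16},
}
# The same rows from Table 1, column D (monocot genes).
monocot = {
    "Gly": {"GGG": 0.21, "GGA": 0.18, "GGT": 0.21, "GGC": 0.40},
    "Lys": {"AAG": 0.87, "AAA": 0.13},
    "Leu": {"TTG": 0.15, "TTA": 0.04, "CTG": 0.27, "CTA": 0.11, "CTT": 0.16, "CTC": 0.27},
    "Pro": {"CCG": 0.20, "CCA": 0.39, "CCT": 0.19, "CCC": 0.22},
}

print(amino_acids_favoring_gc3(dicot))
print(amino_acids_favoring_gc3(monocot))
```

In this small sample glycine switches from disfavoring to favoring G + C in the third position when moving from the dicot to the monocot column, which is the kind of difference the text summarizes across all 18 amino acids with synonymous codons.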

The G ending codons for Thr, Pro, Ala and Ser are avoided in both monocots and dicots because they
contain C in codon position II. The CG dinucleotide is strongly avoided in plants (Boudraa (1987) Genet. Sel.

(Grantham et al. (1985) Bull. Inst. Pasteur
In dicots, XCG is always the least favored codon, while in monocots this

indices to quantify CG and TA doublet avoidance in codon positions II and III. XCG/XCC is the ratio of codons
having C as base II of G-ending to C-ending triplets, while XTA/XTT is the ratio of A-ending to T-ending triplets

with T as the second base. These indices have been calculated for the plant data in this paper (Table 2) and
support the conclusion that monocot and dicot species differ in their use of these dinucleotides.



Table 2. Avoidance of CG and TA doublets in codon positions II-III.
XCG/XCC and XTA/XTT values are multiplied by 100.

Group      Plants   Dicots   Monocots   Maize   Soybean   RuBPC SSU   CAB

XCG/XCC      40       30        61        67       37         18       22
XTA/XTT      37       35        47                             9       13

RuBPC SSU = ribulose 1,5 bisphosphate small subunit
CAB = chlorophyll a/b binding protein
Additionally, for two species, soybean and maize, species-specific codon usage profiles were calculated
(not shown). The maize codon usage pattern resembles that of monocots in general, since these sequences
represent over half of the monocot sequences available. The codon profile of the maize subsample is even
more biased toward G+C in position III. On the other hand, the soybean codon

™»rr»os^^.s-,P-^. 

sohosohate small subunit (RuBPC SSU) and chlorophyll a/b binding protein (CAB) ,s more biased than that o 
IZ genes in Tenet cin usage profiles for subsets o( these genes ,19 and ,7 sequences. respeCvel,) 
* u v Tt,= D„apr I and CAB Dooled samples are characterized by stronger avoid-
w e,e c^'"^ ^^Tta^C?™^ and dL sanies (Tabie 2). Although .no* o, 

in coTn chot Therefore, .he codon choices of individual genes for RuBPC SSU and CAB were tabulated 

than are those of the d.cot spec es. ™» ™ * ssu as we „ as tw o other highly expressed genes 

^ ~s~ These genes almost completely avoid the use 

of A+T in codon position III, although this codon bias was not as pronounced in non-leaf proteins such as alcohol
dehydrogenase, zein 22 kDa subunit, sucrose synthetase and ATP/ADP translocator. Since the wheat SSU
and CAB genes have a similar pattern of codon preference, this may reflect a common monocot pattern for

e.g., AATAAA, polymerase II termination sequence, e.g., CAN(7-9)AGTNNAA, CUUCGG hairpins and
consensus splice sites are highlighted and, if present in the native Btt coding sequence, are modified so as

"^tmLCst^™ 

■ ESo=,^ 

construct the synthetic gene by a corresponding region of natura BJse^ 

bases. Single stranded oligonucleotides are annealed and ligated to form DNA segments. The
length of the protruding cohesive ends in complementary oligonucleotide segments is four
evolved for gene synthesis, the sites designed for the joining of oligonucleotide pieces and DNA segments

di Tt^c^^ 

The nucleotide sequence of each fragment is determined at this stage by the dideoxy method using the re
Structural gene under promoter control can be introduced into plant tissue by any means known to those
skilled in the art. The technique used for a given plant species or specific type of plant tissue depends on the

Est™ a ^ -f"f Ao^ln V lan ,n.na.ic Engineering , New York: Cambridge University Press, p. 

In one of its preferred embodiments, the invention disclosed herein comprises expression in plant cells of

r rt Se==r;r.= 

protein and the protein is always present, minimizing crop damage that would otherwise result from
preapplication larval foraging.
This invention combines the specific teachings of the present disclosure with a variety of techniques and
expedients known in the art. The choice of expedients depends on variables such as the choice of insecticidal
protein from a Bt strain, the extent of modification in preferred codon usage, manipulation of sequences

sites within the design of the synthetic gene to allow future nucleotide modifications, addition of introns or

synthetic genes for desired proteins having agronomic value.

The foundation of the present invention is the ability to synthesize a novel gene coding for an insecticidal
protein, designed so that the protein will be expressed at an enhanced level in plants, yet so that it will
retain its property of insect toxicity and retain or increase its specific insecticidal activity.



EXAMPLES 



The following Examples are presented as illustrations of embodiments of the present invention. They do
not limit the scope of this invention, which is determined by the claims.

The following strains were deposited with the Patent Culture Collection, Northern Regional Research Center,
1815 N. University Street, Peoria, Illinois 61604.

Strain                          Deposited on      Accession #

E. coli MC1061 (p544-HindIII)   6 October 1987    NRRL B-18257
E. coli MC1061 (p544Pst-Met5)   6 October 1987    NRRL B-18258

The deposited strains are provided for the convenience of those in the art, and are not necessary to practice
the invention, which may be practiced with the present disclosure in combination with publicly available
protocols, information, and materials. E. coli MC1061, a good host for plasmid transformations, was disclosed
by Casadaban, M.J. and Cohen, S.N. (1980) J. Mol. Biol. 138:179-207.

Example 1: Design of the synthetic insecticidal crystal protein gene
(i) Preparation of toxic subclones of the Btt gene

Construction, isolation, and characterization of pNSBP544 is disclosed by Sekar, V. et al. (1987) Proc. Natl.
Acad. Sci. USA 84:7036-7040, and Sekar, V. and Adang, M.J., U.S. patent application serial no. 108,285, filed
October 13, 1987, which is hereby incorporated by reference.
protein gene of pNSBP544 is inserted into the HindIII site of pIC-20H (Marsh, J.L. et al. (1984) Gene 32:481-
485), thereby yielding a plasmid designated p544-HindIII, which is on deposit. Expression in E. coli yields a
73 kDa crystal protein in addition to the 65 kDa species characteristic of the crystal protein obtained from Btt

isolates. A 5.9 kb BamHI fragment carrying the crystal protein gene is removed
BamHI-linearized pIC-20H DNA. The resulting plasmid, p405/44-7, is digested with BglII and
moving Bacillus sequences flanking the 3'-end of the crystal protein
2 is digested with PstI and religated, thereby removing Bacillus sequences flanking the 5'-end of the crystal
from the 5'-end of the crystal protein structural gene. The resulting plasmid, p405/81-
^st^^nS 
        SD         MetThrAla
5'     CAGGATCCAACAATGACTGCA 3'
3' GTACGTCCTAGGTTGTTACTG     5'
   SphI                 PstI



(SD indicates the location of a Shine-Dalgarno prokaryotic ribosome binding site.) The resulting plasmid,
p544Pst-Met5, contains a structural gene encoding a protein identical to one encoded by pNSBP544 except
for a deletion of the amino-terminal 47 amino acid residues. The nucleotide sequence of the Btt coding region
in p544Pst-Met5 is presented in Figure 1. In bioassays (Sekar and Adang, U.S. patent application serial no.
108,285, supra), the proteins encoded by the full-length Btt gene in pNSBP544 and the N-terminal deletion
derivative, p544Pst-Met5, were shown to be equally toxic. All of the plasmids mentioned above have their
crystal protein genes in the same orientation as the lacZ gene of the vector.
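The linker shown above can be checked for internal consistency with a few lines of Python. This is only a sketch: the two strands are transcribed from the printed linker with the extraction spaces removed, and the variable and function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


top = "CAGGATCCAACAATGACTGCA"             # upper strand, 5'->3'
bottom_as_printed = "GTACGTCCTAGGTTGTTACTG"  # lower strand, printed 3'->5'
bottom = bottom_as_printed[::-1]          # lower strand rewritten 5'->3'

# Each strand carries a 4-base 3' overhang; the remaining 17 bases pair.
top_paired, top_overhang = top[:-4], top[-4:]
bottom_paired, bottom_overhang = bottom[:-4], bottom[-4:]

print(reverse_complement(top_paired) == bottom_paired)
print(top_overhang, bottom_overhang)
print(top.find("ATGACTGCA"))
```

The overhangs are the 3' extensions expected for ligation into PstI-cut and SphI-cut ends, and the upper strand carries the ATG ACT GCA codons labeled Met, Thr, Ala.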

(ii) Modification of preferred codon usage

Table 1 presents the frequency of codon usage for (A) dicot proteins, (B) Bt proteins, (C) the synthetic Btt gene,
and (D) monocot proteins. Although some codons for a particular amino acid are utilized to approximately the
same extent by both dicot and Bt proteins (e.g., the codons for serine), for the most part, the distribution of
codon frequency varies significantly between dicot and Bt proteins, as illustrated in columns A and B in Table 1.
Table 1. Frequency of Codon Usage

                        Distribution Fraction

Amino           (A) Dicot   (B) Bt    (C) Synthetic   (D) Monocot
Acid    Codon   Genes       Genes     Btt Gene        Genes

Gly     GGG     0.12        0.08      0.13            0.21
Gly     GGA     0.37        0.53      0.37            0.18
Gly     GGT     0.35        0.24      0.34            0.21
Gly     GGC     0.16        0.16      0.16            0.40

Glu     GAG     0.52        0.13      0.52            0.77
Glu     GAA     0.48        0.87      0.48            0.23
Asp     GAT     0.57        0.68      0.56            0.31
Asp     GAC     0.43        0.32      0.44            0.69

Val     GTG     0.30        0.15      0.30
Val     GTA     0.12        0.32      0.10
Val     GTT     0.38        0.29      0.35
Val     GTC     0.20        0.24      0.25            0.34

Ala     GCG     0.05        0.12      0.06            0.20
Ala     GCA     0.26        0.50      0.24            0.16
Ala     GCT     0.42        0.32      0.41            0.28
Ala     GCC     0.28        0.06      0.29            0.36

Lys     AAG     0.61        0.13      0.58            0.87
Lys     AAA     0.39        0.87      0.42            0.13
Asn     AAT     0.45        0.79      0.44            0.23
Asn     AAC     0.55        0.21      0.56            0.77

Met     ATG     1.00        1.00      1.00            1.00
Ile     ATA     0.19        0.30      0.20            0.09
Ile     ATT     0.44        0.57      0.43            0.27
Ile     ATC     0.36        0.13      0.37            0.64

Thr     ACG     0.07        0.14      0.07            0.18
Thr     ACA     0.27        0.68      0.27            0.14
Thr     ACT     0.36        0.14      0.34            0.22
Thr     ACC     0.31        0.05      0.32            0.47

Trp     TGG     1.00        1.00      1.00            1.00
End     TGA     0.46        0.00      0.00            0.34
Cys     TGT     0.43        0.33      0.33            0.27
Cys     TGC     0.57        0.67      0.67            0.73

End     TAG     0.18        0.00      0.00            0.44
End     TAA     0.37        1.00      1.00            0.22
Tyr     TAT     0.42        0.81      0.43            0.19
Tyr     TAC     0.58        0.19      0.57            0.81
Table 1 (CONTINUED)

                        Distribution Fraction

Amino           (A) Dicot   (B) Bt    (C) Synthetic   (D) Monocot
Acid    Codon   Genes       Genes     Btt Gene        Genes

Phe     TTT     0.45        0.75      0.44            0.28
Phe     TTC     0.55        0.25      0.56            0.72

Ser     AGT     0.14        0.25      0.13            0.07
Ser     AGC     0.18        0.13      0.19            0.25
Ser     TCG                           0.06            0.13
Ser     TCA     0.18        0.19      0.17            0.13
Ser     TCT     0.26        0.25      0.27            0.18
Ser     TCC     0.19        0.10      0.17            0.24

Arg     AGG     0.22                                  0.28
Arg     AGA     0.31        0.50      0.32            0.08
Arg     CGG     0.04        0.14      0.05            0.14
Arg     CGA     0.09        0.14      0.09            0.04
Arg     CGT     0.23        0.09      0.23            0.11
Arg     CGC     0.11        0.05      0.09            0.36

Gln     CAG     0.38        0.18                      0.43
Gln     CAA     0.62        0.82      0.61            0.57
His     CAT     0.52        0.90      0.50            0.38
His     CAC     0.48        0.10      0.50            0.62

Leu     TTG     0.26        0.08      0.27            0.15
Leu     TTA     0.10        0.46      0.12            0.04
Leu     CTG     0.09        0.04      0.10            0.27
Leu     CTA     0.08        0.21      0.10            0.11
Leu     CTT     0.29        0.15      0.18            0.16
Leu     CTC     0.19        0.06      0.22            0.27

Pro     CCG     0.07        0.20      0.08            0.20
Pro     CCA     0.44        0.56      0.44            0.39
Pro     CCT     0.32        0.24      0.32            0.19
Pro     CCC     0.16        0.00      0.16            0.22



Bt coding sequences publicly available and 88 coding 
sequences of dicot nuclear genes were used to compile the 
codon usage table. The pooled dicot coding sequences, 
obtained from Genbank, were: 
TABLE 1 (CONTINUED)

GENUS / SPECIES



GENBANK 



PROTEIN 



Antirrhinum majus 
Arabidopsis thaliana 



Bertholletia excelsa 

Brassica campestris
Brassica napus
Brassica oleracea
Canavalia ensiformis
Carica papaya
Chlamydomonas
reinhardtii



Cucurbita pepo 
Cucumis sativus 



Daucus carota

Dolichos biflorus
Flaveria trinervia
Glycine max 



AMACHS Chalcone synthase 

ATHADH Alcohol dehydrogenase 

ATHH3GA Histone 3 gene 1 

ATHH3GB Histone 3 gene 2 

ATHH4GA Histone 4 gene 1 

ATHLHCP1 CAB 

ATHTUBA a tubulin 

5-enolpyruvylshikimate
3-phosphate synthase 1
High methionine storage 
protein 2 
Acyl carrier protein 3 
BNANAP Napin 

BOLSLSGR S-locus specific glycoprotein 
CENCONA Concanavalin A 

CPAPAP Papain 

CREC552 Preapocytochrome

CRERBCS1 RuBPC small subunit gene 1 

CRERBCS2 RuBPC small subunit gene 2 

CUCPHT phytochrome 

CUSGMS Glyoxosomal malate synthetase 

CUSLHCPA CAB 

CUSSSU RuBPC small subunit 

DAREXT Extensin 

DAREXTR 33 kD extensin related protein 

DBILECS seed lectin 

FTRBCR RuBPC small subunit 

SOY7SAA 7S storage protein

SOYACT1G Actin 1

SOYCIIPI CII protease inhibitor

SOYGLYA1A Glycinin A1a Bx subunits

SOYGLYAAB Glycinin A5A4B3 subunits

SOYGLYAB Glycinin A3/B4 subunits

SOYGLYR Glycinin A2B1a subunits

SOYHSP175 Low M W heat shock proteins 
TABLE 1 (CONTINUED)
GENUS / SPECIES GENBANK



PROTEIN 



Gossypium hirsutum
Helianthus annuus HNNRUBCS



Ipomoea batatas
Lemna gibba 

Lupinus luteus 

Lycopersicon 

esculentum 



SOYLGBI Leghemoglobin 
SOYLEA Lectin 
SOYLOX Lipoxygenase 1 

SOYNOD20G 20 kDa nodulin 

SOYNOD23G 23 kDa nodulin

SOYNOD24H 24 kDa nodulin

SOYNOD26B 26 kDa nodulin

SOYNOD26R 26 kDa nodulin

SOYNOD27R 27 kDa nodulin

SOYNOD35M 35 kDa nodulin

SOYNOD75 75 kDa nodulin

SOYNODR1 Nodulin C51

SOYNODR2 Nodulin E27

SOYPRP1 Proline rich protein

SOYRUBP RuBPC small subunit

SOYURA Urease 

SOYHSP26A Heat shock protein 26A 

Nuclear-encoded chloroplast 4 

heat shock protein 

22 kDa nodulin 5

β1 tubulin 6

β2 tubulin 6
Seed α globulin (vicilin) 7

Seed β globulin (vicilin) 7
RuBPC small subunit
2S albumin seed storage
protein 6
Wound-induced catalase 5
LGIAB19 CAB

LGIR5BPC RuBPC small subunit

LUPLBR leghemoglobin 1 

TOMBIOBR Biotin binding protein 

TOMETHYBR Ethylene biosynthesis protein 

TOMPG2AR Polygalacturonase-2a 

TOMPSI Tomato photosystem 1 protein 
TABLE 1 (CONTINUED)
GENUS / SPECIES GENBANK



PROTEIN 



TOMRBCSA 

TOMRBCSB

TOMRBCSC 

TOMRBCSD 

TOMRRD 

TOMWIPIG 



Medicago sativa ALFB3R
Mesembryanthemum
crystallinum
Nicotiana

plumbaginifolia TOBATP21



Nicotiana tabacum TOBECH 
TOBGAPA 
TOBGAPB 
TOBGAPC 
TOBPR1AR 
TOBPR1CR 
TOBPRPR 
TOBPXDLF 
TOBRBPCO 
TOBTHAUR 

Persea americana AVOCEL
Petroselinum 

hortense PHOCHL 



RuBPC small subunit 
RuBPC small subunit 
RuBPC small subunit 
RuBPC small subunit 
Ripening related protein 
Wound induced proteinase 
inhibitor I 

Wound induced proteinase 
inhibitor II 

CAB 1A 10
CAB 1B 10
CAB 3C 10
CAB 4 11
CAB 5 11
Leghemoglobin III H

RuBPC small subunit 12

Mitochondrial ATP synthase
β subunit

Nitrate reductase 13 
Glutamine synthetase 14 
Endochitinase 

A subunit of chloroplast G3PD 

B subunit of chloroplast G3PD 

C subunit of chloroplast G3PD 

Pathogenesis related protein 1a

Pathogenesis related protein 1c

Pathogenesis related protein 1b

Peroxidase 

RuBPC small subunit 

TMV-induced protein homologous 

to thaumatin 

Cellulase 

Chalcone synthase 
TABLE 1 (CONTINUED)

GENBANK

GENUS / SPECIES



PROTEIN 



Petunia sp 



PETCAB13
PETCAB22L
PETCAB22R
PETCAB25
PETCAB37
PETCAB91R
PETCHSR
PETGCR1
PETRBCS08
PETRBCS11



Phaseolus vulgaris PHVCHM 

PHVDLECA 

PHVDLECB 

PHVGSR1 

PHVGSR2 

PHVLBA 

PHVLECT 

PHVPAL 

PHVPHASAR 

PHVPHASBR 



Pisum sativum 



PEAALB2 

PEACAB80 

PEAGSR1 

PEALECA 

PEALEGA

PEARUBPS

PEAVIC2 

PEAVIC4 

PEAVIC7 



CAB 13 
CAB 22L 
CAB 22R 
CAB 25 
CAB 37 
CAB 91R 

Chalcone synthase 

Glycine-rich protein

RuBPC small subunit

RuBPC small subunit

70 kDa heat shock protein 1'

Chitinase 

Phytohemagglutinin E 
Phytohemagglutinin L 
Glutamine synthetase 1 
Glutamine synthetase 2 
Leghemoglobin
Lectin

Phenylalanine ammonia lyase

α phaseolin

β phaseolin

Arcelin seed protein 

Chalcone synthase 

Seed albumin 

CAB 

Glutamine synthetase (nodule) 

Lectin 

Legumin 

RuBPC small subunit 

Vicilin 

Vicilin

Vicilin

Alcohol dehydrogenase 1
Glutamine synthetase (leaf)
Glutamine synthetase (root)
Histone 1
TABLE 1 (CONTINUED)

GENUS / SPECIES GENBANK



PROTEIN 



REF 



Raphanus sativus

Ricinus communis RCCAGG

RCCRICIN
RCCICL4

Silene pratensis SIPFDX
SIPPCY

Sinapis alba SALGAPDH
Solanum tuberosum POTPAT

POTINHWI



Spinacia oleracea SPIACPI
SPIOEC16 



SPIPCG 
SPIPS33 



Vicia faba 



VFALBA 
VFALEB4 



Nuclear encoded chloroplast 4 

heat shock protein

RuBPC small subunit 21

Agglutinin

Ricin

Isocitrate lyase
Ferredoxin precursor
Plastocyanin precursor
Nuclear gene for G3PD 
Patatin 

Wound- induced proteinase 
inhibitor 

Light-inducible tissue specific 
ST-LS1 gene 

Wound- induced proteinase 

inhibitor II 

RuBPC small subunit 

Sucrose synthetase 22 

Acyl carrier protein I 

16 kDa photosynthetic 

oxygen-evolving protein 

23 kDa photosynthetic 

oxygen-evolving protein 

Plastocyanin 

33 kDa photosynthetic water 
oxidation complex precursor 
Glycolate oxidase 23 
Leghemoglobin
Legumin B

Vicilin 24



Pooled 53 monocot coding sequences obtained from Genbank (release 55) 
or, when no Genbank file name is specified, directly from the published 
source, were: 
TABLE 1 (CONTINUED)

GENUS / SPECIES GENBANK

Avena sativa ASTAP3R 
Hordeum vulgare BLYALR 
BLYAMY 1 
BLYAMY 2 
BLYCHORD1 
BLYGLUCB 
BLYHORB 
BLYPAPI 
BLYTH1AR 
BLYUBIQR 



Oryza sativa ricui^uio 

Triticum aestivum WHTAMYA
WHTCAB 
WHTEKR 
WHTGIR 
WHTGLGB 
WHTGLIABA 
WHTGLUTI 
WHTH3 
WHTH4091 
WHTRBCB 

Secale cereale RYESECGSR



PROTEIN 



Zea mays 



MZEA1G 

MZEACTIG 

MZEADH11F 

MZEADH2NR 

MZEALD 

MZEANT 

MZEEG2R 



Phytochrome 3 
Aleurain
α amylase 1
α amylase 2
Hordein C
β glucanase
B1 hordein

Amylase/protease inhibitor
Toxin α hordothionin
Ubiquitin 

Histone 3 25 
Leaf specific thionin 1 26 
Leaf specific thionin 2 26 
Plastocyanin 27 
Glutelin 

Glutelin 28 

α amylase

CAB

Em protein

Gibberellin responsive protein
γ gliadin

α/β gliadin Class AII
High MW glutenin
Histone 3
Histone 4

RuBPC small subunit
γ secalin

40.1 kDa A1 protein (NADPH-
dependent reductase)
Actin

Alcohol dehydrogenase 1
Alcohol dehydrogenase 2
Aldolase

ATP/ADP translocator
Glutelin 2 
TABLE 1 (CONTINUED)

GENBANK

GENUS / SPECIES



PROTEIN 



MZEGGST3B 
MZEH3C2 
MZEH4C14 
MZEHSP701 



Glutathione S transferase 
Histone 3 
Histone 4 

70 kD Heat shock protein, 



MZEHSP702 70 kD Heat shock protein, 

exon 2 
MZELHCP CAB 

MZEMPL3 Lipid body surface protein L3 

MZEPEPCR Phosphoenolpyruvate carboxylase 

MZERBCS RuBPC small subunit

MZESUSYSG Sucrose synthetase

MZETP12 Triosephosphate isomerase 1

MZEZEA20M 19 kD zein

MZEZEA30M 19 kD zein

MZEZE15A3 15 kD zein

MZEZE16 16 kD zein

MZEZE19A 19 kD zein

MZEZE22A 22 kD zein

MZEZE22B 22 kD zein

Catalase 2 29
Regulatory C1 locus 30
Table 1 (CONTINUED)

Bt codons were obtained from analysis of coding sequences
of the following genes: Bt var. kurstaki HD-73, 6.6 kb
HindIII fragment (Kronstad et al. (1983) J. Bacteriol.
154:419-428); Bt var. kurstaki HD-1, 5.3 kb fragment (Adang
et al. (1987) in Biotechnology in Invertebrate Pathology
and Cell Culture, K. Maramorosch (ed.), Academic Press, Inc.
New York, pp. 85-99); Bt var. kurstaki HD-1, 4.5 kb
fragment (Schnepf and Whiteley (1985) J. Biol. Chem.
260:6273-6280); and Bt var. tenebrionis, 3.0 kb HindIII
fragment (Sekar et al. (1987) Proc. Natl. Acad. Sci.
84:7036-7040).
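A codon usage column of the kind compiled above can be derived from a set of coding sequences roughly as follows. This is a minimal sketch rather than the procedure actually used: it pools raw codon counts across all sequences before normalizing within each amino acid, and the function name and the toy input sequences are illustrative only.

```python
from collections import Counter, defaultdict

# Standard genetic code, built from the conventional TCAG ordering; '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def pooled_codon_fractions(coding_sequences):
    """Pool codon counts over in-frame sequences and normalize within each amino acid."""
    counts = Counter()
    for seq in coding_sequences:
        seq = seq.upper()
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skip codons containing ambiguous bases
                counts[codon] += 1
    totals = defaultdict(int)
    for codon, n in counts.items():
        totals[CODON_TABLE[codon]] += n
    return {codon: n / totals[CODON_TABLE[codon]] for codon, n in counts.items()}


# Two toy sequences: AAG AAA AAA and AAA GCT.
fractions = pooled_codon_fractions(["AAGAAAAAA", "AAAGCT"])
print(fractions)
```

Stop codons are grouped together in the same way as the "End" rows of Table 1. Averaging per-gene fractions instead of pooling counts would weight short and long genes equally, which is a different design choice.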
Table 1 (CONTINUED)

17. Ryder, T.B. et al. (1987) Mol. Gen. Genet. 210:219-233.
18. Llewellyn, D.J. et al. (1987) J. Mol. Biol. 195:115-123.
19. Tingey, S.V. et al. (1987) EMBO J. 4:1-9.
20. Gantt, J.S. and Key, J.L. (1987) Eur. J. Biochem. 166:119-125.
21. Guidet, F. and Fourcroy, P. (1988) Nucl. Acids Res. 16:2336.
22. Salanoubat, M. and Belliard, G. (1987) Gene 60:47-56.
23. Volokita, M. and Somerville, C.R. (1987) J. Biol. Chem. 262:15825-15828.
24. Bassner, R. et al. (1987) Nucl. Acids Res. 15:9609.
25. Chojecki, J. (1986) Carlsberg Res. Commun. 51:211-217.
26. Bohlmann, H. and Apel, K. (1987) Mol. Gen. Genet. 207:446-454.
27. Nielsen, P.S. and Gausing, K. (1987) FEBS Lett. 225:159-162.
28. Higuchi, W. and Fukazawa, C. (1987) Gene 55:245-253.
29. Bethards, L.A. et al. (1987) Proc. Natl. Acad. Sci. USA 84:6830-6834.
30. Paz-Ares, J. et al. (1987) EMBO J. 6:3553-3558.



For example, dicots utilize the AAG codon for lysine with a frequency of 61% and the AAA codon with a
frequency of 39%. In contrast, in Bt proteins the lysine codons AAG and AAA are used with a frequency of 13%
and 87%, respectively.

dicot proteins. In designing the synthetic Btt gene, not all codons for alanine in the original
Bt gene are replaced by GCT; instead, only some alanine codons are changed to GCT while others are

expression in monocot plants. In Table 1, column D, is presented the frequency of codon usage of highly
expressed monocot proteins. Because of the degenerate nature of the genetic code, only part of the variation contained in a gene is
monocots and dicots. These patterns are also distinct from those - — ' e „ kaIy „te S

Ingener^the^ntcodonusagep— 
than unicellular organisms, duo to the ^<»£*™? '° r ~ amino ad<js as , na , reported for a sample of 

nr^sT™^^^^ 

genes and In chloroplasts. ^^^*^ he u e o, eL« "V chloro'plast-encoded pro- 
Since chloroplasts have restncted the,r tRNA genes, . me use o p o( 

,„ plants have distinct coding regies. ^«^^ 1 '^^ , 2 d TXS In this sample share 
plast usage, sharing the most commonly used , coder for on y 1 oH ^m chtorop , as , codon 

oenesalthougMhecodonsP^ 

7749 report that codon . .sage ,n 165 1 ™" p ^ ^ ^ ^ se|eak>n .„ 

increased codon bias. Bennetzen and Hall (1982) abundance of isoaccepting tRNAs In 

yeast. Codon usage in these highly ^^^ S ^^„Zt yeast and E. coli mRNA codon usage 
both yeas, and E coll. I. has been f^^^^^^^^^c,^ protein,, 
to isoacceptor tRNA abundance promotes hig ' '™"^ , ^, a ™ ^lant genes in yeast or E. coii is limited 

genes in yeast and other systems. 
, ,m seouences within the Bit coding region h svlno potentially destabilizing influences . 
\„a,ys,so,,he^ S ,e r a,s.h r ^ 

c to be AT-rich. These observations lead to the T^^^^^y in designin g a synthetic Btt gene, 
region may be contributing to a low ^^^^^^^^^ pLt "proteins. As i.lustrated 

genes. The synthetic Btt gene of this invention has an A + T content of 55%.

Table 3. Adenine + Thymine Content in Btt Coding Region

                              Base
Coding region           G     A     T     C     %G+C   %A+T

Natural Btt gene       341   633   514   306     36     64
Synthetic Btt gene     392   530   483   428     45     55
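The percentages in Table 3 follow directly from the base counts, as the short Python check below illustrates. The function name is illustrative; the counts are the ones listed in the table.

```python
def base_composition(g, a, t, c):
    """Return percent G+C and percent A+T for the given base counts."""
    total = g + a + t + c
    return {
        "%G+C": 100 * (g + c) / total,
        "%A+T": 100 * (a + t) / total,
    }


natural = base_composition(g=341, a=633, t=514, c=306)
synthetic = base_composition(g=392, a=530, t=483, c=428)

for name, comp in (("Natural Btt gene", natural), ("Synthetic Btt gene", synthetic)):
    print(name, round(comp["%G+C"]), round(comp["%A+T"]))
```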
end. The selection of the polyA site is presumed to be cis-regulated. During expres
and RNA in different plants, the present inventors have observed that the polyadenylated mRNA isolated

the synthetic Btt gene when scanned for 0 mismatches of the sequences.

(b) Polymerase II termination sequence, CAN(7-9)AGTNNAA. This sequence was shown (Vankan and Filipowicz
(1988) EMBO J. 7:791-799) to be next to the 3' end of the coding region of the U2 snRNA genes


(c) CUUCGG hairpins (Tuerk et al. (1988) Proc. Natl. Acad. Sci. 85:1364-1368). The exceptional

proper folding of complex RNA structures. CUUCGG hairpins are not found with either 0 or

(1986) EMBO J. 5:2749-2758). Consensus sequences for the 5' and 3' splice junc

Thus, by highlighting potential RNA-destabilizing sequences, the synthetic Btt gene is designed to
eliminate known eukaryotic regulatory sequences that affect RNA synthesis and processing.
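
A scan of this kind, in which a coding sequence is searched for short signal motifs while allowing a stated number of mismatches, can be sketched as follows. This is an illustrative Python sketch and not the procedure actually used by the inventors: the function name `find_motifs`, the parameter `max_mismatches`, and the toy test sequence are assumptions introduced here. The two motifs, AATAAA and CUUCGG, are taken from the text.

```python
def find_motifs(seq, motifs, max_mismatches=0):
    """Return sorted (position, motif, mismatches) for every window of seq
    that matches a motif with at most max_mismatches substitutions."""
    seq = seq.upper().replace("U", "T")  # compare RNA motifs in DNA letters
    hits = []
    for motif in motifs:
        m = motif.upper().replace("U", "T")
        for i in range(len(seq) - len(m) + 1):
            window = seq[i:i + len(m)]
            mismatches = sum(1 for x, y in zip(window, m) if x != y)
            if mismatches <= max_mismatches:
                hits.append((i, motif, mismatches))
    return sorted(hits)

# Motifs named in the text; the sequence below is made up for illustration.
MOTIFS = ["AATAAA", "CUUCGG"]
toy = "ATGGCAATAAAGGTCTTCGGTAA"

print(find_motifs(toy, MOTIFS))  # [(5, 'AATAAA', 0), (14, 'CUUCGG', 0)]
```

Calling the function with `max_mismatches=1` would also report near matches, which corresponds to scanning with one mismatch allowed rather than zero.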

Example 2. Chemical synthesis of a modified Btt structural gene

(i) Synthesis Strategy 

DEFGHIJKLM, representing the entire coding region of the Btt gene.

Figure 3 outlines in more detail the strategy used in combining individual DNA segments in order to effect

M) used to construct the synthetic gene vary in size. Oligonucleotide pairs

at appropriate splice sites is detailed in Figure 3. 
(ii) Preparation of oligodeoxynucleotides

Preparation of oligodeoxynucleotides for use in the synthesis of a DNA sequence comprising a gene for
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f ran me agarose ^^^^ by that used in M segment ligations. M segment DNA is brought 

dideoxy method of Sanger et al. (1977) Proc. Natl. Acad. Sci. 74:5463-5467.
(iii) Synthesis of Segment AM
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In Table 4, bold lines demarcate the individual oligonucleotides. Fragment A1 contains 71 bases, / 
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to standard procedures as described above. The overall nucleotide sequence of segment M is:
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In Table 5, bold lines demarcate the individual oligonucleotides. Segment M

destroyed EcoRI restriction sites at both ends. (Additional restriction sites within segment M are indicated.)
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Se 9 n,en, M is inserted into vecor ^^J^^^S^tf^^ Seamenl 

The resulting plasmid of 4.44 kb confers resistance to kanamycin on E. coli.
Example 3. Expression of synthetic crystal protein gene in bacterial systems


selves have potential value as microbial insecticides, product of the synthetic gene.
Example 4. Expression of a synthetic crystal protein gene in plants

available that utilize plant promoters .ncludmg ^ n ^' e ^™ ^ lns in p ,ants. These cassettes 
hornworms. 

Example 5. Assay for insecticidal activity

Bioassays were conducted essentially as described by Sekar, V. et al., supra. Toxicity
esU ma ,eonUo,Pias« 
" ^^^^aTST^SSTSS expressed in piants under iden.icai conditions. ce.s 

Btt genes on Northern blot assays. 



1. A synthetic gene designed to be highly expressed in plants comprising a DNA sequence encoding an
insecticidal protein which is functionally equivalent to a native insecticidal protein of Bt.

2. A synthetic gene of claim 1 wherein said DNA sequence is at least about 85% homologous to a native 
insecticidal protein gene of Btt. 

3. A synthetic gene of claim 1 wherein said DNA sequence is that presented in Figure 1 spanning nucleotides
1 through 1793.
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4. A synthetic gene of claim 1 wherein said DNA sequence is that presented in Figure 1 spanning nucleotides
1 through 1833.

5. A synthetic gene of claim 1 wherein the overall frequency of preferred codon usage within the entire coding
region of said synthetic gene is within about 75% of the frequency of codon usage preferred in plants.

6. A synthetic gene of claim 1 wherein the A+T base content of said DNA sequence is substantially equal to
the A+T base content found in plant structural genes.

7. A synthetic gene of claim 1 wherein a plant initiation sequence is present at the 5' end of the coding region.

8. A synthetic gene of claim 1 wherein plant polyadenylation signals comprising those having AATAAA,
AATGAA, AATAAT, AATATT, GATAAA, GATAAA and AATAAG motifs, are eliminated in said DNA
sequence.

9. A synthetic gene of claim 1 wherein the polymerase II termination sequence, CAN(7-9)AGTNNAA, is
eliminated in said DNA sequence.

10. A synthetic gene of claim 1 wherein CUUCGG hairpins are eliminated in said DNA sequence.

11. A synthetic gene of claim 1 wherein plant consensus splice sites, including 5' = AAG:GTAAGT and
3' = TTTT(Pu)TTT(Pu)T(Pu)T(Pu)T(Pu)TGCAG:C, are eliminated in said DNA sequence.

12. A synthetic gene of claim 1 wherein the CG and TA doublet avoidance indices are substantially equal to
that of highly expressed genes in the selected host plant.

13. A recombinant DNA cloning vector comprising said synthetic gene of claim 1. 

14. A plant cell which contains the synthetic gene of claim 1.

secticidal protein of Bt such that said synthetic gene is expressed in said plant host.
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FIG. 2. Coding region of the synthetic Btt gene divided into 13 segments (A through M).
Segments shown in the assembly scheme: A, M, AM, BL, ABLM, CK, ABCKLM, F, ABCDJKLM, FH, ABCDEIJKLM, G, FGH, ABCDEFGHIJKLM.
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FIG. 3 (horizontal scale from 0 to 1.8 in steps of 0.1)
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